feat/social-games-elo-eval #342

Keyu-He · 2025-12-09T09:44:14Z

Closes #

📑 Description

✅ Checks

My pull request adheres to the code style of this project
My code requires changes to the documentation
I have updated the documentation as required
All the tests have passed
Branch name follows type/descript (e.g. feature/add-llm-agents)
Ready for code review

ℹ Additional Information

with minor bugs, will fix in future iterations

contain minor bugs, will fix in future iterations

Fixes several bugs preventing custom models (via custom/model@url format) from working:
- Fix parameter name in generate.py: api_base → base_url (line 257)
- Fix hardcoded "gpt-4" evaluator models in server.py (lines 309, 401)
  Now uses model_dict.get("evaluator", model_dict["env"])
- Add markdown code block stripping in PydanticOutputParser
  Many local LLMs wrap JSON in ```json...```, parser now handles this
- Fix format_bad_output to support custom models
  Passes base_url/api_key through error recovery path
  Conditionally uses response_format (custom servers may not support it)
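
The fence stripping and the evaluator fallback described in this commit message can be illustrated roughly as follows. This is a minimal sketch, not the code in this PR: `strip_markdown_fence` and the sample reply are hypothetical names made up for illustration, and the real change in `PydanticOutputParser` may handle more cases. Only the `model_dict.get("evaluator", model_dict["env"])` expression is taken from the commit message.

```python
import json
import re

# Hypothetical helper (assumption, not the parser's actual code): remove one
# wrapping markdown fence, with or without a language tag, around a reply.
_FENCE_RE = re.compile(
    r"^\s*```[A-Za-z0-9_-]*\s*\n?(.*?)\n?\s*```\s*$", re.DOTALL
)


def strip_markdown_fence(text: str) -> str:
    """Return the fenced payload if the text is fenced, else the text itself."""
    match = _FENCE_RE.match(text)
    return match.group(1).strip() if match else text.strip()


# A reply shaped the way many local LLMs return JSON (made-up sample data).
raw_reply = '```json\n{"action_type": "speak", "argument": "hi"}\n```'
parsed = json.loads(strip_markdown_fence(raw_reply))

# Evaluator fallback as quoted in the commit message: use the "evaluator"
# entry when present, otherwise reuse the "env" model.
model_dict = {"env": "custom/model@url"}
evaluator_model = model_dict.get("evaluator", model_dict["env"])
```

Unfenced replies pass through unchanged, so the same call is safe for models that already return bare JSON.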

Merge branch 'fix/custom-model-support' into feature/social-game-support

…ility issues in the game

Refactor SocialDeductionGame for real-time history and cleaner prompts
- ParallelSotopiaEnv: Added `include_turn_marker` flag to control environment turn messages.
- SocialDeductionGame:
  - Disabled environment turn markers to avoid duplication.
  - Implemented real-time history appending via `recv_message` override and `agent_message_buffer`.
  - Populated `action_instruction` in `Observation` for dynamic prompt instructions.
- Observation: Added `action_instruction` field.
- generate.py: Added `fill_template` helper for partial string formatting.
- LLMAgent: Updated `aact` to use `fill_template` to inject `action_instructions` into `custom_template`.
- Werewolves: Updated config description to populate `{agent_names}` dynamically.
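
To make "partial string formatting" concrete, here is one way such a helper could behave. This is a sketch under assumptions: `fill_template_sketch` is a hypothetical stand-in, and the actual `fill_template` in generate.py may have a different signature and different rules. The idea shown is that only the placeholders that are supplied get replaced, while the rest stay in the template for a later formatting pass.

```python
import re


def fill_template_sketch(template: str, **values: str) -> str:
    """Substitute only the {name} placeholders given in `values`.

    Hypothetical illustration, not the helper added in this PR. Placeholders
    without a supplied value are left untouched instead of raising KeyError,
    which is what plain str.format would do.
    """

    def replace(match: re.Match) -> str:
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)

    return re.sub(r"\{(\w+)\}", replace, template)


# Made-up template text; only the placeholder names echo the commit message.
template = "Players: {agent_names}. {action_instructions} History: {history}"
partial = fill_template_sketch(template, action_instructions="Vote now.")
```

After this call, `{agent_names}` and `{history}` are still present in `partial`, so a later step can fill them in.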


Keyu-He and others added 24 commits September 21, 2025 01:10

werewolf game in progress

d389b6e

with minor bugs, will fix in future iterations

werewolf game in progress

8b8850d

contain minor bugs, will fix in future iterations

updated prompt

2cc3990

current progress

3a9f689

fix mypy errors

df62578

To run the local models

f482b60

Merge branch 'fix/custom-model-support' into feature/social-game-support

Design Social Game class, werewolf demo working in progress

b453633

Merge branch 'main' into feature/social-game-support

7de839b

update on the SocialGame class / SocialDeductionGame class

ff49e41

fix mypy errors

71711b6

debugging on the prompts

39bb4e3

werewolf game debug

ca835c3

next step: change script_like to false, and fix the remaining errors that this may cause

Refactor social_game.py and update werewolves example

c0f7866

Add Social Game Engine documentation

089b30c

Delete examples/experimental/negotiation_arena/NegotiationArena_1_Buy…

4ce9d6c

…_Sell_custom_models.py

Restore sotopia/cli/install/redis-data/dump.rdb to match origin/main

39f46cd

Revert unnecessary changes in the uniform_sample and server.py

aacd07a

Minor update on werewolf prompt, Compatibility on uniform sampler and…

f676238

… server

update uniform_sampler and server.py to the correct versions

67dc7db

previous commit reverted too much..

move visibility prompt inside werewolf game's config

d48f71d

add more example games, add elo score calculation

37039de

Refactor ELO tournament system with parallel execution and enhanced l…

f0aa8a9

…ogging

Keyu-He changed the title ~~feat/social-games-elo-eva~~ feat/social-games-elo-eval Dec 9, 2025

Keyu-He added 3 commits December 9, 2025 04:57

Move logging configuration to main entry points

9a6b115

Update run_elo_tournament.py

daca3f4

Improve ELO leaderboard reporting and roster generation

2160f7b
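
For readers unfamiliar with the rating scheme named in the commits above, the textbook two-player Elo update looks like this. It is a sketch only: the commit messages do not state the K-factor, the starting rating, or how multi-player social games are reduced to pairwise results, so `k=32.0`, the 1000-point ratings, and the function names are all assumptions for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return updated (A, B) ratings after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss.
    k=32.0 is an assumed K-factor, not a value taken from this PR.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two equally rated (hypothetical) models; A wins the game.
new_a, new_b = update_elo(1000.0, 1000.0, score_a=1.0)
```

With equal starting ratings the expected score is 0.5, so the winner gains half of K and the loser gives up the same amount; the total rating in the pool is conserved.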
